TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Des dictionnaires éditoriaux aux représentations XML standardisées

Internal identifier: 000015 (Main/Exploration); previous: 000014; next: 000016

Authors: Mathieu Mangeot [France]; Chantal Enguehard [France]

RBID : Hal:hal-00959229

Abstract

Creating an electronic dictionary from scratch is expensive because the task mobilizes, over a long period, contributors skilled if not in lexicology then at least in linguistics. Specialized computer tools are essential when the resources are to be used by natural language processing programs. When the socio-economic environment cannot supply the resources needed to draft an electronic dictionary but printed dictionaries exist, those dictionaries are an important resource for bootstrapping the creation of electronic lexical resources. This paper presents theoretical and practical aspects of converting published dictionaries into electronic lexical resources. It takes into account the issues of limited economic resources, technology and the availability of qualified people. Our field experiments concern under-resourced languages, mainly in Southeast Asia (Khmer, Malay, Vietnamese) and the Sahel (Bambara, Hausa, Kanuri, Tamajaq, Zarma), so most of the examples and socio-linguistic situations described in the paper relate to these areas. After a brief history of electronic dictionary formats (SGML, XML, XSLT and CSS), we present two standards dedicated to them (Text Encoding Initiative and Lexical Markup Framework). The issue of under-resourced languages is then set out, followed by some examples of published dictionaries. The main technical challenges are detailed, such as the lack of standardization of the alphabets used and of special characters (outside the traditional Latin range). The conversion methodology is outlined and then detailed. Conversion to a bridge format in XML can be done with regular expressions or with specialized tools. The bridge format is then converted into the target LMF format. The last part is dedicated to consulting the resources through an online resource-management platform.
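
The abstract mentions converting a dictionary to an XML bridge format with regular expressions. The following is only a minimal sketch of that idea, not the authors' actual tooling: the line layout (`headword [pronunciation] pos. definition`), the element names (`entry`, `headword`, `pron`, `pos`, `definition`) and the sample line are all invented for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical layout of one line in a word-processed dictionary:
#   headword [pronunciation] pos. definition
ENTRY_RE = re.compile(
    r"^(?P<headword>[^\[\]]+?)\s*"   # everything before the bracket
    r"\[(?P<pron>[^\]]*)\]\s*"       # pronunciation between [ and ]
    r"(?P<pos>\w+)\.\s*"             # part of speech, ended by a dot
    r"(?P<definition>.+)$"           # the rest of the line
)


def to_bridge_xml(line):
    """Turn one text line into a bridge-format <entry>, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    # Escape &, < and > so the captured text is safe as XML character data.
    fields = {k: escape(v.strip()) for k, v in match.groupdict().items()}
    return (
        "<entry>"
        "<headword>{headword}</headword>"
        "<pron>{pron}</pron>"
        "<pos>{pos}</pos>"
        "<definition>{definition}</definition>"
        "</entry>"
    ).format(**fields)


# Invented sample line, not taken from any of the dictionaries in the paper.
sample = "chat [ʃa] n. small domesticated feline <Felis catus>"
print(to_bridge_xml(sample))
```

Lines that do not fit the pattern return `None` rather than raising, so they can be set aside for manual review instead of stopping a batch conversion.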

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr">Des dictionnaires éditoriaux aux représentations XML standardisées</title>
<author>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-49635" status="VALID">
<orgName> Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole </orgName>
<orgName type="acronym">GETALP</orgName>
<date type="start">2007</date>
<desc>
<address>
<addrLine>GETALP-LIG - 110 av. de la Chimie - Domaine Universitaire - BP 53 - 38041 Grenoble - cedex 9</addrLine>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-24471" type="direct"></relation>
<relation active="#struct-3886" type="indirect"></relation>
<relation active="#struct-51016" type="indirect"></relation>
<relation active="#struct-300275" type="indirect"></relation>
<relation name="UMR5217" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-24471" type="direct">
<org type="laboratory" xml:id="struct-24471" status="VALID">
<orgName>Laboratoire d'Informatique de Grenoble</orgName>
<orgName type="acronym">LIG</orgName>
<desc>
<address>
<addrLine>UMR 5217 - Laboratoire LIG - 38041 Grenoble cedex 9 - France Tél. : +33 (0)4 76 51 43 61 - Fax : +33 (0)4 76 51 49 85</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.liglab.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-3886" type="direct"></relation>
<relation active="#struct-51016" type="direct"></relation>
<relation active="#struct-300275" type="direct"></relation>
<relation name="UMR5217" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-3886" type="indirect">
<org type="institution" xml:id="struct-3886" status="OLD">
<idno type="IdRef">02640432X</idno>
<orgName>Université Pierre Mendès France - Grenoble 2</orgName>
<orgName type="acronym">UPMF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 47 - 38040 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-51016" type="indirect">
<org type="institution" xml:id="struct-51016" status="OLD">
<idno type="IdRef">026404796</idno>
<orgName>Université Joseph Fourier - Grenoble 1</orgName>
<orgName type="acronym">UJF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 53 - 38041 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ujf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300275" type="indirect">
<org type="institution" xml:id="struct-300275" status="OLD">
<idno type="IdRef">026388804</idno>
<orgName>Institut National Polytechnique de Grenoble </orgName>
<orgName type="acronym">INPG</orgName>
<date type="end">2006-12-31</date>
<desc>
<address>
<addrLine>46 avenue Félix Viallet 38031 Grenoble Cedex 1</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5217" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Grenoble</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Grenoble</orgName>
</affiliation>
</author>
<author>
<name sortKey="Enguehard, Chantal" sort="Enguehard, Chantal" uniqKey="Enguehard C" first="Chantal" last="Enguehard">Chantal Enguehard</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-95421" status="VALID">
<orgName>Laboratoire d'Informatique de Nantes Atlantique</orgName>
<orgName type="acronym">LINA</orgName>
<desc>
<address>
<addrLine>LINA - Faculté des Sciences 2 rue de la Houssinière - BP 92208 44322 NANTES CEDEX 3</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.sciences.univ-nantes.fr/lina</ref>
</desc>
<listRelation>
<relation active="#struct-84538" type="direct"></relation>
<relation active="#struct-302102" type="indirect"></relation>
<relation active="#struct-93263" type="direct"></relation>
<relation name="UMR6241" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-84538" type="direct">
<org type="laboratory" xml:id="struct-84538" status="VALID">
<orgName>Mines Nantes</orgName>
<orgName type="acronym">Mines Nantes</orgName>
<desc>
<address>
<addrLine>La Chantrerie - 4, rue Alfred Kastler - BP 20722 - 44307 Nantes cedex 3</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.mines-nantes.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-302102" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-302102" type="indirect">
<org type="institution" xml:id="struct-302102" status="VALID">
<orgName>Institut Mines-Télécom</orgName>
<desc>
<address>
<addrLine>46 rue Barrault -75634 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.mines-telecom.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-93263" type="direct">
<org type="institution" xml:id="struct-93263" status="VALID">
<orgName>Université de Nantes</orgName>
<orgName type="acronym">UN</orgName>
<desc>
<address>
<addrLine>1, quai de Tourville - BP 13522 - 44035 Nantes cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-nantes.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR6241" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nantes</settlement>
<region type="region" nuts="2">Pays de la Loire</region>
</placeName>
<orgName type="university">Université de Nantes</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-00959229</idno>
<idno type="halId">hal-00959229</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-00959229</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-00959229</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Hal/Corpus">000029</idno>
<idno type="wicri:Area/Hal/Curation">000029</idno>
<idno type="wicri:Area/Hal/Checkpoint">000015</idno>
<idno type="wicri:explorRef" wicri:stream="Hal" wicri:step="Checkpoint">000015</idno>
<idno type="wicri:Area/Main/Merge">000015</idno>
<idno type="wicri:Area/Main/Curation">000015</idno>
<idno type="wicri:Area/Main/Exploration">000015</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="fr">Des dictionnaires éditoriaux aux représentations XML standardisées</title>
<author>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-49635" status="VALID">
<orgName> Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole </orgName>
<orgName type="acronym">GETALP</orgName>
<date type="start">2007</date>
<desc>
<address>
<addrLine>GETALP-LIG - 110 av. de la Chimie - Domaine Universitaire - BP 53 - 38041 Grenoble - cedex 9</addrLine>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-24471" type="direct"></relation>
<relation active="#struct-3886" type="indirect"></relation>
<relation active="#struct-51016" type="indirect"></relation>
<relation active="#struct-300275" type="indirect"></relation>
<relation name="UMR5217" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-24471" type="direct">
<org type="laboratory" xml:id="struct-24471" status="VALID">
<orgName>Laboratoire d'Informatique de Grenoble</orgName>
<orgName type="acronym">LIG</orgName>
<desc>
<address>
<addrLine>UMR 5217 - Laboratoire LIG - 38041 Grenoble cedex 9 - France Tél. : +33 (0)4 76 51 43 61 - Fax : +33 (0)4 76 51 49 85</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.liglab.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-3886" type="direct"></relation>
<relation active="#struct-51016" type="direct"></relation>
<relation active="#struct-300275" type="direct"></relation>
<relation name="UMR5217" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-3886" type="indirect">
<org type="institution" xml:id="struct-3886" status="OLD">
<idno type="IdRef">02640432X</idno>
<orgName>Université Pierre Mendès France - Grenoble 2</orgName>
<orgName type="acronym">UPMF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 47 - 38040 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-51016" type="indirect">
<org type="institution" xml:id="struct-51016" status="OLD">
<idno type="IdRef">026404796</idno>
<orgName>Université Joseph Fourier - Grenoble 1</orgName>
<orgName type="acronym">UJF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 53 - 38041 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ujf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300275" type="indirect">
<org type="institution" xml:id="struct-300275" status="OLD">
<idno type="IdRef">026388804</idno>
<orgName>Institut National Polytechnique de Grenoble </orgName>
<orgName type="acronym">INPG</orgName>
<date type="end">2006-12-31</date>
<desc>
<address>
<addrLine>46 avenue Félix Viallet 38031 Grenoble Cedex 1</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5217" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Grenoble</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Grenoble</orgName>
</affiliation>
</author>
<author>
<name sortKey="Enguehard, Chantal" sort="Enguehard, Chantal" uniqKey="Enguehard C" first="Chantal" last="Enguehard">Chantal Enguehard</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-95421" status="VALID">
<orgName>Laboratoire d'Informatique de Nantes Atlantique</orgName>
<orgName type="acronym">LINA</orgName>
<desc>
<address>
<addrLine>LINA - Faculté des Sciences 2 rue de la Houssinière - BP 92208 44322 NANTES CEDEX 3</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.sciences.univ-nantes.fr/lina</ref>
</desc>
<listRelation>
<relation active="#struct-84538" type="direct"></relation>
<relation active="#struct-302102" type="indirect"></relation>
<relation active="#struct-93263" type="direct"></relation>
<relation name="UMR6241" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-84538" type="direct">
<org type="laboratory" xml:id="struct-84538" status="VALID">
<orgName>Mines Nantes</orgName>
<orgName type="acronym">Mines Nantes</orgName>
<desc>
<address>
<addrLine>La Chantrerie - 4, rue Alfred Kastler - BP 20722 - 44307 Nantes cedex 3</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.mines-nantes.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-302102" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-302102" type="indirect">
<org type="institution" xml:id="struct-302102" status="VALID">
<orgName>Institut Mines-Télécom</orgName>
<desc>
<address>
<addrLine>46 rue Barrault -75634 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.mines-telecom.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-93263" type="direct">
<org type="institution" xml:id="struct-93263" status="VALID">
<orgName>Université de Nantes</orgName>
<orgName type="acronym">UN</orgName>
<desc>
<address>
<addrLine>1, quai de Tourville - BP 13522 - 44035 Nantes cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-nantes.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR6241" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nantes</settlement>
<region type="region" nuts="2">Pays de la Loire</region>
</placeName>
<orgName type="university">Université de Nantes</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="so">
<term>DiLAF</term>
<term>LMF</term>
<term>XML</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Creating an electronic dictionary from scratch is expensive because the task mobilizes, over a long period, contributors skilled if not in lexicology then at least in linguistics. Specialized computer tools are essential when the resources are to be used by natural language processing programs. When the socio-economic environment cannot supply the resources needed to draft an electronic dictionary but printed dictionaries exist, those dictionaries are an important resource for bootstrapping the creation of electronic lexical resources. This paper presents theoretical and practical aspects of converting published dictionaries into electronic lexical resources. It takes into account the issues of limited economic resources, technology and the availability of qualified people. Our field experiments concern under-resourced languages, mainly in Southeast Asia (Khmer, Malay, Vietnamese) and the Sahel (Bambara, Hausa, Kanuri, Tamajaq, Zarma), so most of the examples and socio-linguistic situations described in the paper relate to these areas. After a brief history of electronic dictionary formats (SGML, XML, XSLT and CSS), we present two standards dedicated to them (Text Encoding Initiative and Lexical Markup Framework). The issue of under-resourced languages is then set out, followed by some examples of published dictionaries. The main technical challenges are detailed, such as the lack of standardization of the alphabets used and of special characters (outside the traditional Latin range). The conversion methodology is outlined and then detailed. Conversion to a bridge format in XML can be done with regular expressions or with specialized tools. The bridge format is then converted into the target LMF format. The last part is dedicated to consulting the resources through an online resource-management platform.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Auvergne-Rhône-Alpes</li>
<li>Pays de la Loire</li>
<li>Rhône-Alpes</li>
</region>
<settlement>
<li>Grenoble</li>
<li>Nantes</li>
</settlement>
<orgName>
<li>Université Joseph Fourier</li>
<li>Université de Grenoble</li>
<li>Université de Nantes</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Auvergne-Rhône-Alpes">
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
</region>
<name sortKey="Enguehard, Chantal" sort="Enguehard, Chantal" uniqKey="Enguehard C" first="Chantal" last="Enguehard">Chantal Enguehard</name>
</country>
</tree>
</affiliations>
</record>
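
As displayed, the record uses the `wicri:` and `hal:` prefixes without declaring them, so a namespace-aware parser is likely to reject it as it stands. The sketch below shows one possible workaround in Python: it binds both prefixes to placeholder URIs (an assumption made purely to satisfy the parser, not the real namespace names) and then pulls out the title, authors, year and identifiers. The embedded `RECORD` string is an abridged excerpt of the record above.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the record above; the full record is much longer.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr">Des dictionnaires éditoriaux aux représentations XML standardisées</title>
<author>
<name sortKey="Mangeot, Mathieu" sort="Mangeot, Mathieu" uniqKey="Mangeot M" first="Mathieu" last="Mangeot">Mathieu Mangeot</name>
<affiliation wicri:level="1">
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Enguehard, Chantal" sort="Enguehard, Chantal" uniqKey="Enguehard C" first="Chantal" last="Enguehard">Chantal Enguehard</name>
<affiliation wicri:level="1">
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Hal:hal-00959229</idno>
<idno type="halId">hal-00959229</idno>
<date when="2013">2013</date>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs: assumptions, present only so the prefixes are bound.
DUMMY_NS = 'xmlns:wicri="urn:example:wicri" xmlns:hal="urn:example:hal"'


def parse_record(text):
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", "<record " + DUMMY_NS + ">", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names, year and typed identifiers."""
    stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": pub.find("date").get("when"),
        "ids": {idno.get("type"): idno.text for idno in pub.findall("idno")},
    }


summary = summarize(parse_record(RECORD))
print(summary)
```

Patching only the root tag leaves the rest of the record untouched; elements without a prefix stay in no namespace, so plain paths such as `author/name` keep working.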

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000015 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000015 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-00959229
   |texte=   Des dictionnaires éditoriaux aux représentations XML standardisées
}}
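
When links are needed for many records, the template text can be produced programmatically. The following is a small illustrative sketch: the helper name `explor_link` and its argument names are hypothetical, and only the template name and its parameter names (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) come from the example above.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Build the text of an {{Explor lien}} template call (hypothetical helper)."""
    params = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in params:
        # Pad "name=" to 9 characters to mirror the alignment used above.
        lines.append("   |" + (name + "=").ljust(9) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Wicri/Ticri",
    area="TeiVM2",
    flux="Main",
    etape="Exploration",
    type_="RBID",
    cle="Hal:hal-00959229",
    texte="Des dictionnaires éditoriaux aux représentations XML standardisées",
)
print(link)
```

The padding is cosmetic and only reproduces the layout shown on this page.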

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024